arXiv:1502.00190vl [math.NA] 1 Feb 2015 


Randomized Kaczmarz Algorithm for Inconsistent 
Linear Systems; An Exact MSE Analysis 

Chuang Wang^, Ameya Agaskar^’^ and Yue M. Lu^ 

^Harvard University, Cambridge, MA 02138, USA 
^MIT Lincoln Laboratory, Lexington, MA 02420, USA 
E-mail; { chuangwang, aagaskar,y uelu } @ seas .harvard, edu 


Abstract —We provide a complete characterization of the 
randomized Kaczmarz algorithm (RKA) for inconsistent linear 
systems. The Kaczmarz algorithm, known in some fields as 
the algebraic reconstruction technique, is a classical method 
for solving large-scale overdetermined linear systems through 
a sequence of projection operators; the randomized Kaczmarz 
algorithm is a recent proposal by Strohmer and Vershynin to 
randomize the sequence of projections in order to guarantee 
exponential convergence (in mean square) to the solutions. A 
flurry of work followed this development, with renewed Interest 
in the algorithm, its extensions, and various bounds on their 
performance. Earlier, we studied the special case of consistent 
linear systems and provided an exact formula for the mean 
squared error (MSE) in the value reconstructed by RKA, as well 
as a simple way to compute the exact decay rate of the error. 
In this work, we consider the case of inconsistent linear systems, 
which is a more relevant scenario for most applications. Eirst, by 
using a “lifting trick”, we derive an exact formula for the MSE 
given a fixed noise vector added to the measurements. Then we 
show how to average over the noise when it is drawn from a 
distribution with known first and second-order statistics. Elnally, 
we demonstrate the accuracy of our exact MSE formulas through 
numerical simulations, which also illustrate that previous upper 
bounds in the literature may be several orders of magnitude too 
high. 

Index Terms —Overdetermined linear systems, Kaczmarz Al¬ 
gorithm, randomized Kaczmarz algorithm 

1. Introduction 

The Kaczmarz algorithm JT] is a simple and popular it¬ 
erative method for solving large-scale overdetermined linear 
systems. Given a full-rank measurement matrix A € 
with m> n, we wish to recover a signal vector x € R” from 
its measurements y € R™, given by 


r = k mod m. Since affine subspaces are convex, this algorithm 
is a special case of the projection onto convex sets (POCS) 
algorithm El. 

Due to its simplicity, the Kaczmarz algorithm has been 
widely used in signal and image processing. It has long been 
observed by practitioners that its convergence rate depends on 
the ordering of the rows of A, and that choosing the row order 
at random can often lead to faster convergence a. Yet it was 
only recently that Strohmer and Vershynin first rigorously an¬ 
alyzed the randomized Kaczmarz algorithm (RKA) 0. They 
considered a convenient row-selection probability distribution; 
choosing row i with probability proportional to its squared 
norm ||ai|p, and proved the following upper bound on the 
mean squared error (MSE) of the RKA at the kth iteration; 

E - xf < (l - -xf, (2) 

where ka ||A||i7’||A“^||2 is related to the condition number 
of A, and A~^ is its left-inverse. This bound guarantees that 
the MSE decays exponentially as the RKA iterations proceed. 

The work of Strohmer and Vershynin spurred a great deal 
of interest in RKA and its various extensions (see, e.g., 
0-1II1). In particular, Needell 16] derived a bound for the 
performance of the algorithm when the underlying linear 
system is inconsistent. In this case, the measurements are 

y = Ax + r], (3) 

where rj is an additive noise vector. NeedelTs bound was later 
improved by Zouzias and Ereris fS], who proved the following 
upper bound on the MSE; 


y = Ax. (1) 

Each row of A describes a single linear measurement yi = 
aj X, and the set of all signals x that satisfy that equation is 
an (n-1)-dimensional (affine) subspace in R". The Kaczmarz 
algorithm begins with an arbitrary initial guess x^'^\ then 
cycles through all the rows, projecting the iterand onto 

the subspace {a; € R" : a^x = yr} to obtain x^^\ where 

The Lincoln Laboratory portion of this work was sponsored by the 
Department of the Air Force under Air Force Contract #FA8721-05-C-0002. 
Opinions, interpretations, conclusions and recommendations are those of the 
authors and are not necessarily endorsed by the United States Government. 

C. Wang and Y. M. Lu were supported in part by the U.S. National Science 
Foundation under Grant CCF-1319140. 


-xf <{l-K a)'' 


\\X 


( 0 ) 


■ + 




(4) 


where crmin(A) is the smallest singular value of A. Note that 
this bound is equal to the original noiseless bound in (|2]l plus 
an extra term proportional to the total squared error in the 
measurements. 

In this paper, we provide a complete characterization of 
the randomized Kaczmarz algorithm for inconsistent linear 
systems given in Q. We show in Section HJ how to compute 
the exact MSE of the algorithm (averaging over the random 
choices of the rows) at each iteration. This extends our earlier 
results derived for the special case when the measurements 



are noiseless 113. A key ingredient of our derivation is a 
“lifting trick”, which allows us to analyze the evolution of 
the MSE (a quadratic quantity) through a much simpler linear 
recursion embedded in a higher-dimensional “lifted” space. 
We show that existing upper bounds in the literature 0 , a 
can be easily derived from our exact MSE formula. By setting 
the number of iterations to infinity, we also provide a closed- 
form expression for the limiting MSE {i.e., error floor) of the 
algorithm. 

Our MSE analysis in Section |II] is conditioned on the noise, 
i.e., we assume that 77 in Q is a fixed and deterministic 
vector. Thus, the resulting expressions for the MSE and the 
error floor depend on rj. In practice, the measurement noise r/ 
is unknown but its elements can often be modeled as zero- 
mean i.i.d. random variables drawn from some probability 
distributions {e.g., Gaussian random variables.) We consider 
this setting in Section [111] where we compute the MSE with 
the expectations taken over two sources of randomness: the 
random row-selections made by the algorithm and the noise 
vector. In this case the final MSE has two terms: one equals 
to that of the noiseless case we analyzed in lfT3 and an extra 
term proportional to the noise variance. 

We demonstrate the accuracy of our exact MSE formulas 
through numerical simulations reported in Section |IV] These 
empirical results also illustrate that previous upper bounds in 
the literature may be several orders of magnitude too high over 
the true performance of the algorithm. 

II. Exact Performance Analysis of RKA 
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TABLE I: List of important notation and variables. 

simplified through this choice, we fix A and allow the pi to 
be chosen arbitrarily. Our only restriction is that pi> 0 for all 
i so that every measurement is used. 

In 113, we computed the exact MSE of the RKA for the 
special case of consistent linear systems {i.e., the noiseless 
case.) In what follows, we extend our earlier result and analyze 
the more general inconsistent case. 

B. Exact MSE Analysis Using Lifting 

We consider a given measurement matrix A and a fixed 
noise vector rj. The exact MSE at iteration step k over the 
randomness of the algorithm are formulated in this section. 

To lighten the notation, we define the normalized jth row 
vector of A as 2^ e R” and let fii These and 

other important definitions are summarized in Table |I] for the 
the reader’s convenience. By combining Q and ([3, the error 
vector - x can be expressed as 


A. Overview and Notation 


Consider an inconsistent linear system as in ([3. Given the 
measurement y and the matrix A, the randomized Kaczmarz 
algorithm seeks to (approximately) reconstruct the unknown 
signal X through iterative projections. The iterand x^^'> e R." 
is initialized arbitrarily. At the fcth step, a row ik is chosen 
at random; row i is chosen with probability pi. The update 
equation is 




+ 
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where is the *th row of A. The intuition behind the 
algorithm is simple. In the noiseless case (i.e., when 77 = 0), 
each row of A and its corresponding entry in y defines an 
affine subspace on which the solution x must lie; at each 
iteration, the RKA algorithm randomly selects one of these 
subspaces and projects the iterand onto it, getting closer to 
the true solution with each step. 

The row-selection probabilities Pi are tunable parameters 
of the algorithm. Other authors 111, @, IS) have fixed the 
probabilities to be pi = ||ai|p/||A|||,. This is not really a 
restriction, since the rows of A can be scaled arbitrarily and, as 
long as the measurements are scaled appropriately, the solution 
and the algorithm’s iterations do not change. This particular 
choice, though, is convenient because it leads to simplified 
bounds, allowing them to be written in terms of a (modified) 
condition number of A. Since most of our expressions are not 


= Pi 


+ CbikBik ^ 
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where ‘5?*^ I - aiaj is the projection onto the (n - 1)- 
dimensional subspace orthogonal to the jth row 2^. Averaging 
over the randomness of the algorithm, we get an iterative 
equation of the mean error vector 

Ez^'^^ = PEz^^-^K f, (7) 


where P =^EPi 


and / “Eoipj. 


Note that the ease with which we can obtain © from 
is mainly due to the linearity of the original random 
recursion in ®. However, the quantity we are interested in, 
the mean-squared error E , is a non-linear (quadratic) 

term. To compute it, we “lift” the problem by treating the 
covariance matrix Ez^^\z^'^'>)'’" as an -dimensional vector 
whose dynamics are determined by the algorithm. In the lifted 
space, the dynamics are still linear (as we shall soon see), 
thus allowing for a relatively simple analysis. The MSE can 
be easily obtained as the trace of Ez^^\z^'^'>)'’". 

Consider the fcth iteration: 
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The linearity of this expression will be clearer if we “vector¬ 
ize” by vertically concatenating its columns to 

form a vector vec( 2 :^^^( 2 :^^^)^) e R" . In what follows, we 








will make use of the following matrix identity which holds for 
any matrices dimensioned so that ABC is well-defined: 

vec(ABC) = (C^® A)vec(B), (9) 

where ® represents the Kronecker matrix product ini. First, 


we note that 


vec( 2 : 






where is introduced as a shorthand notation for the 
Kronecker product of a vector v and itself. Then, we can apply 
the identity (|9]l to the right hand side of (l8]l to obtain 
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Taking expectation on both sides of the equation over the 
randomness of the algorithm, we obtain a simple iterative 
formula for the second-moment matrix: 


'n) 
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where Q = E {Pj ® Pf), D = Erj, (P 
and e Epf af 

We can combine (|7]i and (fTOl i into a single linear recursion 
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We thus have the following proposition: 


Proposition 1. For a fixed noise vector rj, and an initial error 
vector the MSE of RKA at the kth iteration is given by 
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where V 2 = {I - P)~^ f and Vi = {I - Q)~^ [e + Dv 2 ]. 

Proof: We hrst solve the linear recursion (fTTT i to get a 
closed-form expression 

/E[2(fe)]®2\ , /E[2(0)]®2^ 
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that depends on the initial error z^^\ Using the identity 


= (I - HY{I - and noting that 
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we can simplify (fl4l l and get 
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Meanwhile, using ® and the fact that {A ® ® 

P^), the MSE can be expressed in terms of the vectorized 
second-moment matrix as 


E 




= vec(J„)^(E[^l'=l]®2). (16) 


Combining this with (flST l yields the desired result. 


C. The Limiting MSE 

With the iteration number k going to infinity, the MSE in 
(fOl) will converge to a limiting value {i.e., an error floor) that 
only depends on the error vector rj. To see this, we first note 
that both P and Q are positive semideflnite matrices by their 
constructions. In fact, one can show that 

0 < Ainax (Q) ^ Amax (P) < 1, 

where Ai„ax(') is the largest eigenvalue of matrix. Eurther- 
more, one can show that the set of eigenvalues of H defined 
in (fTSI) is the union of those of P and Q, and that H is a 
contraction mapping, with limfc^oo = 0. 

This contraction property of H implies that the first term 
in the right-hand side of (foT l vanishes as k goes to infinity. 
It follows that the limiting MSE can be characterized through 
Vi as follows. 

Proposition 2. The limiting MSE is given by 

E|| 2 :^°°^|| =trmat(ui), (17) 

where the mat operator undoes the vec operator to produce 
an nxn matrix from an -dimensional vector. 

Due to the space limit, we omit the proof of these asser¬ 
tions, which involve elementary matrix analysis and will be 
presented in a follow-up paper. 

D. Previous Upper Bounds on the MSE 

The existing bounds on noisy Kaczmarz performance 
ISl can be recovered via our formulation. Prom (fTTI) . we have 

E = vec(/)^Qvec(Ez('=-^)(z^'=“^))^) 

(lo) 

+ vec(J)^I3Ez('="i) +vec(I)^e. 

Using the definition of Q, we have 

vec(I)^Q = ® P,")vec(I) 

= vec{Y,P^PtPY^ 

i 

= vec(P)^, 

where the second equality can be obtained from the identity 
(|9]l and the last equality follows from Pf being an idempotent 
matrix and from the definition of P. It follows that the first 
term on the right-hand side of (fT^ can be bounded as follows: 

vec(/)^Q vec 

= vec(P)^ vec {Ez^’^-^'> 

The second term on the right-hand side of (fTsT i is 0, since 
vec{I) = [vec(Piai) + vec(af P^)] = 0. 

i 

The third term is given by 

vec(I)^e = vec(7)^ Y^pipfaf^ = Y^Pipf. 




So, all together, we have 

E < A„,ax(P)E 

i 

Applying this inequality recursively gives us a bound equiva¬ 
lent to that in Zouzias and Freris JD: 

E < aLx(^)e ii^^°)ir+ ^ 

-L ) 

Remark 1. In Section WV\ our simulation results will illustrate 
that this upper bound may be several orders of magnitude too 
high than the true performance of the algorithm. 


III. Average over the noise 

Our exact MSB expression given in Proposition [T] depends 
on the noise vector rj. In practice, of course, 77 is unknown, but 
we may have information about its statistics. In this section, 
we suppose that rj is drawn from a probability distribution; 
in particular, we assume that its elements rji are i.i.d. random 
variables with zero-mean and variance tr^. Here, it is impor¬ 
tant to differentiate between two sources of randomness; the 
random row-selections made by the algorithm and the random 
vector rj. In what follows, E is understood as the conditional 
expectation operator over the randomness of the algorithm, 
with rj fixed, and we define E.^ as the average over the noise. 

It is convenient to rewrite (fOl l as 

E = vec(J„)^ [Q'= _ V,) 

+fkiD) ( 2 ;^°) -'*^ 2 )] +trmat(t!i) , 

where 

fkiD)= Y, (21) 

Since fk{D) is a linear function, we have E.^/fe(£)) = 
/fe(E^£)) = 0. Averaging I® over the noise, we get the 
following proposition. 


Proposition 3. The MSB of RKA at the kth iteration averaged 
over both the randomness of the algorithm and noise is 


EjjE I 


Afc)ir - 


= vec(7„)^[g"([2(°)]®2-E^-ui) 

-E^/fe(JD)t; 2 ] -i-trmat(E.^t;i). 


( 22 ) 


This formula involves two noise-related quantities, E.^ 7 ;i 
and E^ fk{D)v 2 , both of which are second-order in the noise. 
This shows that our knowledge of the second-order statistics 
of the noise is sufficient to compute them. In particular, the 
first term is given by 


Et^ Vi 




Ek 



+ p((7-P)-i) 


where we define the matrix function 

9{M) = YT^{a^®P■+P■®a,)Ma,. (23) 


(In these expressions the extra factors of ||ai|p are not 
erroneous—they account for the varying signal-to-noise ratio 
of the measurements.) 


The second noise-related term is computed by 

^r,fk{D)v2= E Q^^r,DP^-^-\l-P)-^f 
0<i<k 

= a^ Y Q'E^p(P'=-i-'(7-P)-i). 
o<r<fc 

Remark 2. The first term on the right-hand side of (I221 l 
decays exponentially because Amax(Q) < 1- Thus, the limiting 
MSB averaged over both the randomness of the algorithm and 
noise is E^E || 2 ;^°“^|| = trmat(Ej,t;i). 

IV. Experimental Results 

We verified our results with numerical simulations. We took 
care in our implementations of the matrices Q and D in order 
to minimize the time- and space-complexity. Q is an x 
matrix, which in a naive implementation would require 
O(n^) storage and O(n^) computation to multiply by a vector. 
Instead, we use the fact that 

Qx = vec mat(a;)P,^ j (24) 

to implement multiplication by Q with no additional storage in 
time 0{mn^) [since Pf can be multiplied by other matrices 
in time Meanwhile, we use the fact that 

Dx = vec ^{P^.xaJ + a.x'^P^)^ (25) 

to implement D without any additional storage. This saves no 
computation, since it takes 0{rrm?) time. 

For the noise-averaged formula (l22l i. we can use the struc¬ 
ture of Q to compute the complex term fk{D)v 2 in 
0{kmnf) time. Alternatively, we could use an eigenvector 
decomposition of P and Q to compute it in a time constant 
in k: we must use 0{n*) space and 0{mn^) time. This would 
make sense if we wanted to compute the MSB for a single, 
moderately large k. 

The results of two experiments are shown in this paper. 
First, we tested the fixed noise formula (HI. We drew a 
single noise vector rj with ||j 7 |p = 1 . 6 , and a starting error 
and choose a 150 x 50 measurement matrix A that had 
i.i.d. Gaussian entries. Then we ran 1007 separate trials of 
the randomized Kaczmarz algorithm, with each trial running 
for 2000 iterations and starting with an error vector We 
plotted the average MSE of the trials at each iteration on a log 
scale. The results are shown in Figure |l(a)| and show that the 
expression we derived in ([T3T l matches the numerical results 
very well. We also plotted existing bounds as well. 

The bounds are significantly higher than the true MSE. 

Next, we tested the noise-averaged formula (l22li . We used 
the AIR Tools package in MATFAB llT4l to generate a 
tomography measurement matrix A of size 148 x 100. The 
noise vector rj had i.i.d. entries with variance cr^ = 2.25 x lO”"* 
and was drawn independently for each trial. We ran 1007 
separate trials of the randomized Kaczmarz algorithm, with 
each trial running for 3000 iterations. The results are shown in 
Figure [T(b)| The close match between empirical and theoretical 
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Fig. 1; (a) The mean squared error - £c|p is shown on a logarithmic scale as a function of the iteration number k. The 

matrix A has Gaussian entries, and the error vector r] is hxed in advance, with Ijr^lp = 1.6. The average results from 1007 
trials are shown as the blue curve, and the results from 150 of the trials are shown in gray. The analytical expression (fOT l is 
shown as a dashed green line, and clearly matches the simulation results quite well. The Needed @ and Zouzias-Freris ll8l 
bounds are shown as well, and are far higher than the true MSE. (b) The mean square error - a;|p averaged over 

both the algorithm’s randomness and the noise is shown on a logarithmic scale as a function of the iteration number k. The 
matrix A is the measurement matrix of a tomographic system (generated by the AIR Tools package iflTll '). and the error vector 
r; is a zero mean Gaussian vector with variance 2.25 x 10“^, drawn independently with each trial. The average of 1007 trials 
are shown in blue along with the results from 150 of the trials in gray. The analytical expression for the Gaussian noise case 
(I22I 1 clearly matches the simulation results. The noise-averaged Zouzias-Freris bound is shown as well for comparison. 


curves verify our expression for the noise-averaged MSE (l22l l. 
The graph also shows that the noise-averaged version of the 
Zouzias-Ereris bound is more than two orders of magnitude 
higher than the true limiting MSE in this case. 

V. Conclusions 

We provided a complete characterization of the random¬ 
ized Kaczmarz algorithm when applied to inconsistent linear 
systems. We developed an exact formula for the MSE of the 
algorithm when the measurement vector is corrupted by a hxed 
noise vector. We also showed how to average this expression 
over a noise distribution with known hrst and second-order 
moments. We described efficient numerical implementations 
of these expressions that limit the time- and space-complexity. 
Simulations show that the exact MSE expressions we derived 
have excellent matches with the numerical results. Moreover, 
our experiments indicate that existing upper bounds on the 
MSE may be loose by several orders of magnitude. 
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